Vietnamese to Chinese Machine Translation via Chinese Character as Pivot
Authors
Abstract
Using Chinese characters as an intermediate equivalent unit, we decompose machine translation into two stages: semantic translation and grammar translation. This strategy is tentatively applied to machine translation between Vietnamese and Chinese. During semantic translation, Vietnamese syllables are converted one by one into the corresponding Chinese characters. During grammar translation, the resulting sequences of Chinese characters, still in Vietnamese grammatical order, are modified and rearranged to form grammatical Chinese sentences. Compared with the existing single alignment model, the two-stage division lets research and evaluation target each stage separately. The proposed method is evaluated using the standard BLEU score and a new manual evaluation metric, understanding rate. Relying on only a small number of dictionaries, the proposed method gives results that are competitive with, and in some cases better than, those of existing systems.
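The two-stage split described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' system: the dictionary entries, the function names, the pre-chunked input, and the single reordering rule (reversing the chunks of a head-initial noun phrase) are all assumptions chosen for illustration, and the source does not describe its actual dictionaries or rules.

```python
import unicodedata

def _norm(text):
    """Lowercase and NFC-normalize so lookups are insensitive to encoding form."""
    return unicodedata.normalize("NFC", text).lower()

# Hypothetical toy dictionary: Vietnamese syllable -> Chinese character.
# A real system would need a far larger, disambiguated dictionary.
SYLLABLE_TO_HANZI = {
    _norm(k): v
    for k, v in {
        "đại": "大", "học": "学",
        "quốc": "国", "gia": "家",
        "hà": "河", "nội": "内",
    }.items()
}

def semantic_translate(words):
    """Stage 1: convert each syllable to a character, keeping Vietnamese order.

    `words` is a list of already-segmented Vietnamese words (an assumption);
    unknown syllables are marked with "?".
    """
    return [
        "".join(SYLLABLE_TO_HANZI.get(_norm(s), "?") for s in word.split())
        for word in words
    ]

def grammar_translate(chunks):
    """Stage 2 (toy rule): Vietnamese noun phrases put the head first and
    Chinese puts it last, so reverse the chunk order."""
    return "".join(reversed(chunks))

def translate_noun_phrase(words):
    """Run both stages on a pre-segmented noun phrase."""
    return grammar_translate(semantic_translate(words))

if __name__ == "__main__":
    phrase = ["Đại học", "Quốc gia", "Hà Nội"]
    print(semantic_translate(phrase))     # characters in Vietnamese order
    print(translate_noun_phrase(phrase))  # characters in Chinese order
```

Keeping the stages as separate functions mirrors the abstract's point that each stage can be inspected and evaluated on its own: the output of `semantic_translate` shows dictionary errors, while `grammar_translate` isolates reordering errors.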
Similar resources
A Character Level Based and Word Level Based Approach for Chinese-Vietnamese Machine Translation
Chinese and Vietnamese are both isolating languages; that is, the words are not delimited by spaces. In machine translation, word segmentation is often done first when translating from Chinese or Vietnamese into other languages (typically English) and vice versa. However, it is worth considering whether words should be segmented at all when translating between two languages in whi...
Improving Arabic-Chinese Statistical Machine Translation using English as Pivot Language
We present a comparison of two approaches for Arabic-Chinese machine translation using English as a pivot language: sentence pivoting and phrase-table pivoting. Our results show that using English as a pivot in either approach outperforms direct translation from Arabic to Chinese. Our best result is the phrase-pivot system which scores higher than direct translation by 1.1 BLEU points. An error...
Machine Translation between Uncommon Language Pairs via a Third Common Language: The Case of Patents
This paper aims to familiarize MT users with two major areas of development: (1) the use of a third language as the pivot to improve translation quality between uncommon language pairs. Various techniques have been shown to be promising when parallel corpora for the uncommon language pairs are not readily available. They require the use of two other language pairs involving a common th...
The NICT/ATR speech translation system for IWSLT 2008
This paper describes the National Institute of Information and Communications Technology/Advanced Telecommunications Research Institute International (NICT/ATR) statistical machine translation (SMT) system used for the IWSLT 2008 evaluation campaign. We participated in the Chinese–English (Challenge Task), English–Chinese (Challenge Task), Chinese–English (BTEC Task), Chinese–Spanish (BTEC Tas...
A Novel Approach for Handling Unknown Word Problem in Chinese-Vietnamese Machine Translation
For languages where spaces do not mark word boundaries, such as Chinese and Vietnamese, word segmentation is always the first task in a statistical machine translation (SMT) system. Word segmentation increases translation quality, but it causes many unknown words (UKW) in the target translation. In this paper, we present a novel approach to translating UKW. Based on the ...